{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "93de5736",
   "metadata": {},
   "source": [
    "# Lesson 4: Sentence Embeddings"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c9363203",
   "metadata": {},
   "source": [
    "- In the classroom, the libraries are already installed for you.\n",
    "- If you would like to run this code on your own machine, you can install the following:\n",
    "``` \n",
    "    !pip install sentence-transformers\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "632f7aac-6786-4ea6-8fa3-25a6cebbd2e5",
   "metadata": {},
   "source": [
    "- Here is some code that suppresses warning messages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "058015a6-19cf-4f80-940d-f4af86cb589c",
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "from transformers.utils import logging\n",
    "logging.set_verbosity_error()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c35a8e72",
   "metadata": {},
   "source": [
    "### Build the `sentence embedding` pipeline using 🤗 Transformers Library"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "2ed9cec8-803a-4d7e-99d9-c4c84682901c",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "from sentence_transformers import SentenceTransformer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "dd5dbb50-8c36-456c-ac0e-f724429c4b7f",
   "metadata": {
    "height": 30
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "f477c562ac644656a09572b36f8b78c7",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "32b6a6dd5e2a49889a4232831648d6bb",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "416194046b31421ebb8aa430ee340f81",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "README.md:   0%|          | 0.00/10.7k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "591d7ab058bc4a5ca7b8dd77dc23f663",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "758911bce2bb4365ad73acba30a3423b",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "a1070e85f4d44bf9932219c997fa6c3a",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "pytorch_model.bin:   0%|          | 0.00/90.9M [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "fbb3e6de5e5640f48ef6fe4188f63943",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "tokenizer_config.json:   0%|          | 0.00/350 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "7faaf865f1e04dd0bb31a53c42223823",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "1065313dab364207a5250b97ba4b03cd",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "361832cb527140ca928dda8e721bd5bd",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "dbe5bd8b42a54f1d82c4574e369b2cc4",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "1_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "model = SentenceTransformer(\"all-MiniLM-L6-v2\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cad701ed",
   "metadata": {},
   "source": [
    "More info on [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "fe7d23e0-5e68-4537-8dd8-eb125e1a6820",
   "metadata": {
    "height": 64
   },
   "outputs": [],
   "source": [
    "sentences1 = ['The cat sits outside',\n",
    "              'A man is playing guitar',\n",
    "              'The movies are awesome']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "c33db645-edd8-4a28-a06f-e0fd8200f27f",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "embeddings1 = model.encode(sentences1, convert_to_tensor=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "75de3f4a-bd8e-41d6-847b-9a3a043adeeb",
   "metadata": {
    "height": 30
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[ 0.1392,  0.0030,  0.0470,  ...,  0.0641, -0.0163,  0.0636],\n",
       "        [ 0.0227, -0.0014, -0.0056,  ..., -0.0225,  0.0846, -0.0283],\n",
       "        [-0.1043, -0.0628,  0.0093,  ...,  0.0020,  0.0653, -0.0150]])"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "embeddings1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "5136886d-80d4-4a3a-a68e-692c25496b51",
   "metadata": {
    "height": 64
   },
   "outputs": [],
   "source": [
    "sentences2 = ['The dog plays in the garden',\n",
    "              'A woman watches TV',\n",
    "              'The new movie is so great']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "c7e0c68f",
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "embeddings2 = model.encode(sentences2, \n",
    "                           convert_to_tensor=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "8a213124-ea97-4706-bbf4-737490e94244",
   "metadata": {
    "height": 30
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[ 0.0163, -0.0700,  0.0384,  ...,  0.0447,  0.0254, -0.0023],\n",
      "        [ 0.0054, -0.0920,  0.0140,  ...,  0.0167, -0.0086, -0.0424],\n",
      "        [-0.0842, -0.0592, -0.0010,  ..., -0.0157,  0.0764,  0.0389]])\n"
     ]
    }
   ],
   "source": [
    "print(embeddings2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "41c3d585",
   "metadata": {},
   "source": [
    "* Calculate the cosine similarity between two sentences as a measure of how similar they are to each other."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "9e3b38a5-9b35-49de-9f85-c62583d6287d",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "from sentence_transformers import util"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "39c1f4f3-94ad-4b5e-a40d-c4ba8277815b",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "cosine_scores = util.cos_sim(embeddings1,embeddings2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "b6859d46-15a7-4f61-8a9f-06a15baeff40",
   "metadata": {
    "height": 30
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[ 0.2838,  0.1310, -0.0029],\n",
      "        [ 0.2277, -0.0327, -0.0136],\n",
      "        [-0.0124, -0.0465,  0.6571]])\n"
     ]
    }
   ],
   "source": [
    "print(cosine_scores)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "fae8571e-2dea-4872-b244-342731b949de",
   "metadata": {
    "height": 94
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The cat sits outside \t\t The dog plays in the garden \t\t Score: 0.2838\n",
      "A man is playing guitar \t\t A woman watches TV \t\t Score: -0.0327\n",
      "The movies are awesome \t\t The new movie is so great \t\t Score: 0.6571\n"
     ]
    }
   ],
   "source": [
    "for i in range(len(sentences1)):\n",
    "    print(\"{} \\t\\t {} \\t\\t Score: {:.4f}\".format(sentences1[i],\n",
    "                                                 sentences2[i],\n",
    "                                                 cosine_scores[i][i]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "49863f4c",
   "metadata": {},
   "source": [
    "### Try it yourself! \n",
    "- Try this model with your own sentences!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.18"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
